Skip to content
This repository was archived by the owner on Jan 23, 2026. It is now read-only.

Add eval runner support #194

Merged
potofpie merged 28 commits intomainfrom
eval-handler
Oct 22, 2025
Merged

Add eval runner support #194
potofpie merged 28 commits intomainfrom
eval-handler

Conversation

@potofpie
Copy link
Copy Markdown
Member

@potofpie potofpie commented Oct 6, 2025

Summary by CodeRabbit

  • New Features

    • Added an evaluation system with run/record capabilities and a background eval job scheduler.
    • Exposed SHA-256 hash utilities (async and sync) in the public API.
    • Prompt metadata now includes eval references for end-to-end eval tracking.
  • Bug Fixes

    • Improved error reporting and diagnostics when loading prompts.
  • Tests

    • Added unit tests for the new hash utilities.
  • Chores

    • Initialized eval scheduler at startup and expanded public exports.
    • Added .DS_Store to .gitignore.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Oct 6, 2025

Walkthrough

Adds an evaluation subsystem (EvalAPI, EvalJobScheduler), integrates eval execution into request/session lifecycle and server startup, introduces SHA-256 hash utilities and tests, propagates eval identifiers through prompt metadata, and includes minor infra/logging and .gitignore updates.

Changes

Cohort / File(s) Summary
Project Configuration
\.gitignore
Adds .DS_Store to ignored files
Package Metadata
package.json
Version bumped 0.0.156 → 0.0.157; no other substantive changes
Service API Extension
src/apis/api.ts
Extends ServiceName with 'eval' and adds base-url logic preferring AGENTUITY_EVAL_URL (fallback to transport URL)
New Eval API
src/apis/eval.ts
New EvalAPI export with types/interfaces and methods to discover evals (by name/id), parse embedded metadata, dynamically import eval modules, execute evals, collect results, and POST run completion
Eval Job Scheduler
src/apis/evaljobscheduler.ts
New singleton EvalJobScheduler export managing PendingEvalJob entries with createJob, removeJob, getJobs, printState, and cross-context singleton initialization; exports PendingEvalJob and JobFilter
Prompt Handling
src/apis/prompt/index.ts
Adds per-path existence checks, per-path try/catch recording attemptErrors, richer error reporting when all paths fail, and finer-grained debug logs
Public Exports
src/index.ts
Re-exports EvalAPI, EvalJobScheduler, hash, hashSync; expands public API surface and imports new modules
Router Integration
src/router/context.ts
Adds private executeEvalsForSession and executeEvalsForJob to process pending eval jobs post-session completion; integrates EvalAPI and EvalJobScheduler and adds logging
Router Control Flow
src/router/router.ts
Changes router then-callback to async and explicitly awaits contextHandler.waitUntilAll, with logging before/after await
Server Initialization (Bun)
src/server/bun.ts
Initializes global EvalJobScheduler on server start; consolidates imports
Server Initialization (Node)
src/server/node.ts
Initializes global EvalJobScheduler before server creation; imports isIdle for idle checks
Hash Utilities
src/utils/hash.ts
New async hash(value: string): Promise<string> (Web Crypto) and sync hashSync(value: string): string (Node crypto) with ArrayBuffer→hex helper
Prompt Metadata
src/utils/promptMetadata.ts
Adds evals?: string[] to PromptAttributesParams, switches template/compiled hashing to hashSync, and propagates evals through metadata and storage
Hash Tests
test/utils/hash.test.ts
New unit tests verifying async/sync hash parity, consistency, distinctness, and empty-string handling
PatchPortal tweak
src/apis/patchportal.ts
Random segment generation changed from substr(2,9) to substring(2,9) (affects instanceId only)
Changelog
CHANGELOG.md
Adds v0.0.157 entry: "Add support for eval running"

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Router
    participant ContextHandler
    participant EvalJobScheduler
    participant EvalAPI
    participant EvalModule
    rect #f0f7ff
    Router->>ContextHandler: handle request -> waitUntilAll(sessionId)
    ContextHandler->>ContextHandler: await pending promises
    end
    ContextHandler->>EvalJobScheduler: getJobs(sessionId)
    EvalJobScheduler-->>ContextHandler: [PendingEvalJob...]
    loop Per job
        ContextHandler->>EvalAPI: loadEvalMetadataMap()
        EvalAPI-->>ContextHandler: Map<slug,id>
        loop Per prompt in job
            ContextHandler->>EvalAPI: runEval(evalName,input,output,sessionId,spanId)
            EvalAPI->>EvalModule: dynamic import (file URL)
            EvalModule-->>EvalAPI: EvalFunction
            EvalAPI->>EvalModule: invoke eval with EvalContext
            EvalModule-->>EvalAPI: EvalRunResult
            EvalAPI->>EvalAPI: POST /evalrun/finish
            EvalAPI-->>ContextHandler: CreateEvalRunResponse
        end
        ContextHandler->>EvalJobScheduler: removeJob(spanId)
    end
    ContextHandler->>ContextHandler: mark session completed
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Add prompt meta data #193 — Modifies prompt metadata and PatchPortal-related logic; strong overlap with prompt metadata and PatchPortal changes in this PR.

Poem

🐰 I hopped through code with nimble feet,

New evals to find and jobs to meet,
Hashes hum secrets, tidy and neat,
Sessions close — then evaluations repeat,
A carrot-cheer for builds complete 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title Check ✅ Passed The pull request title "Add eval runner support" accurately describes the primary objective of the changeset. The PR introduces comprehensive evaluation running capability through multiple new components: the EvalAPI class for loading and executing evals, the EvalJobScheduler singleton for managing evaluation jobs, supporting utilities like SHA-256 hashing, and integration points in the router and server initialization. The title is clear, concise, and specific enough that a teammate reviewing the PR history would immediately understand this adds eval execution functionality. It avoids vague terminology and directly conveys the main enhancement.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch eval-handler

Warning

Review ran into problems

🔥 Problems

Errors were encountered while retrieving linked issues.

Errors (1)
  • SHA-256: Entity not found: Issue - Could not find referenced Issue.

Comment @coderabbitai help to get the list of available commands and usage tips.

@potofpie potofpie changed the base branch from main to add-prompt-meta-data October 6, 2025 14:14
Base automatically changed from add-prompt-meta-data to main October 8, 2025 18:35
Copy link
Copy Markdown
Member

@jhaynie jhaynie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to think about idle detection so we dont suspend process while running evals.

Comment thread package.json Outdated
"esbuild": ">=0.25.0"
},
"dependencies": {
"@clickhouse/client": "^1.0.0",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whaaat?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah lol my bad need to remove this

Comment thread src/server/bun.ts Outdated

const url = new URL(req.url);

// Handle eval routes
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just make it a normal route handler

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the route stuff

@potofpie potofpie marked this pull request as ready for review October 21, 2025 20:52
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/apis/prompt/index.ts (1)

95-121: Use CommonJS require() for generated module loading (per project rules)

This loader uses dynamic ESM import with a file URL. Our guidelines mandate require() + fs.existsSync + try/catch for bundler compatibility and runtime loading.

Apply this diff (switch to createRequire + existsSync, preserve cache‑bust via require.cache):

-import fs from 'fs/promises';
+import fs from 'fs/promises';
+import fsSync from 'fs';
 import path from 'path';
-import { pathToFileURL } from 'url';
+import { createRequire } from 'module';
@@
 	public async loadPrompts(): Promise<void> {
@@
-			internal.debug('Trying absolute paths:', possiblePaths);
-			const attemptErrors: string[] = [];
-			for (const possiblePath of possiblePaths) {
-				internal.debug('  Checking:', possiblePath);
-				try {
-					await fs.access(possiblePath);
-					internal.debug('  ✓ File exists:', possiblePath);
-
-					// Get file stats for cache-busting
-					const stats = await fs.stat(possiblePath);
-					const mtime = stats.mtime.getTime();
-
-					// Convert to file URL with cache-busting query param
-					const fileUrl = pathToFileURL(possiblePath).href + `?t=${mtime}`;
-					internal.debug('  Importing from:', fileUrl);
-
-					// Use ESM dynamic import instead of require
-					generatedModule = await import(fileUrl);
-					internal.debug('  ✓ Successfully loaded from:', possiblePath);
-					break;
-				} catch (error) {
-					const errorMsg =
-						error instanceof Error ? error.message : String(error);
-					internal.debug(`  ✗ Failed to load ${possiblePath}: ${errorMsg}`);
-					attemptErrors.push(`${possiblePath}: ${errorMsg}`);
-				}
-			}
+			internal.debug('Trying absolute paths:', possiblePaths);
+			const attemptErrors: string[] = [];
+			const require = createRequire(import.meta.url);
+			for (const possiblePath of possiblePaths) {
+				internal.debug('  Checking:', possiblePath);
+				try {
+					if (!fsSync.existsSync(possiblePath)) {
+						internal.debug('  ✗ File not found:', possiblePath);
+						attemptErrors.push(`${possiblePath}: not found`);
+						continue;
+					}
+					internal.debug('  ✓ File exists:', possiblePath);
+					// Cache-bust require on mtime
+					try {
+						const stats = await fs.stat(possiblePath);
+						const resolved = require.resolve(possiblePath);
+						const cached = require.cache?.[resolved];
+						if (cached && (cached as any).loaded) {
+							delete require.cache[resolved];
+							internal.debug('  ↻ Cleared require cache:', resolved, stats.mtimeMs);
+						}
+					} catch {
+						/* ignore cache clear errors */
+					}
+					generatedModule = require(possiblePath);
+					internal.debug('  ✓ Successfully loaded from:', possiblePath);
+					break;
+				} catch (error) {
+					const errorMsg =
+						error instanceof Error ? error.message : String(error);
+					internal.debug(`  ✗ Failed to load ${possiblePath}: ${errorMsg}`);
+					attemptErrors.push(`${possiblePath}: ${errorMsg}`);
+				}
+			}
@@
-				throw new Error(
-					`Generated prompts file not found. Tried:\n${attemptErrors.join('\n')}`
-				);
+				throw new Error(
+					`Generated prompts file not found. Tried:\n${attemptErrors.join('\n')}`
+				);

As per coding guidelines / Based on learnings.

src/utils/promptMetadata.ts (1)

18-18: Remove stray expression statement [];

This no-op is likely accidental and can confuse readers.

Apply:

-[];
🧹 Nitpick comments (18)
src/server/internal-routes.ts (3)

16-18: Match '/_agentuity' with or without trailing slash and strip query.

Current check misses '/_agentuity' and paths with query. Parse pathname first.

-	isInternalRoute(url: string): boolean {
-		return url.startsWith('/_agentuity/');
-	}
+	isInternalRoute(url: string): boolean {
+		const pathname = url.startsWith('http')
+			? new URL(url).pathname
+			: url.split('?')[0];
+		return pathname === '/_agentuity' || pathname.startsWith('/_agentuity/');
+	}

31-32: Return structured 404 with content-type.

Helps clients and avoids text bodies.

-		return new Response('Internal route not found', { status: 404 });
+		return new Response(JSON.stringify({ error: 'not_found' }), {
+			status: 404,
+			headers: { 'content-type': 'application/json' },
+		});

9-11: Keep the logger for future, but store it for consistency.

Small polish; no behavior change.

-export class InternalRoutesHandler {
-	constructor(_logger: Logger) {
-		// Logger parameter kept for future use
-	}
+export class InternalRoutesHandler {
+	constructor(private readonly logger: Logger) {}
src/apis/evaljobscheduler.ts (3)

30-35: Use slice instead of deprecated substr.

Avoid deprecated String.prototype.substr.

- this.instanceId = `EvalJobScheduler-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`;
+ this.instanceId = `EvalJobScheduler-${Date.now()}-${Math.random()
+  .toString(36)
+  .slice(2, 11)}`;

40-60: getInstance doesn’t need to be async.

No awaits present; returning Promise adds overhead for callers.

- public static async getInstance(): Promise<EvalJobScheduler> {
+ public static getInstance(): EvalJobScheduler {
   // ...unchanged...
-  return globalThis.__evalJobSchedulerInstance;
+  return globalThis.__evalJobSchedulerInstance!;
 }

Note: adjust call sites accordingly.


99-104: Consider TTL or max-capacity to prevent unbounded growth.

Add eviction (e.g., max jobs per session, time-based cleanup).

src/apis/eval.ts (6)

248-289: Validate and normalize score/pass results before emitting.

  • Clamp score to [0,1].
  • Keep comment and behavior aligned.
- score: (
-  val: number,
+ score: (
+  val: number,
   meta?: { reasoning?: string; [key: string]: unknown }
 ) => {
+  const score = Math.max(0, Math.min(1, val));
   createEvalRunRequest = {
     projectId,
     sessionId,
     spanId,
     result: {
       success: true,
-      score: val,
+      score,
       metadata: {
         reason: meta?.reasoning || '',
         ...meta,
       },
     },
     evalId: metadata.id,
     promptHash,
   };
 },

225-233: Guard missing projectId.

If AGENTUITY_CLOUD_PROJECT_ID is unset, fail fast or warn loudly.

- const projectId = process.env.AGENTUITY_CLOUD_PROJECT_ID || '';
+ const projectId = process.env.AGENTUITY_CLOUD_PROJECT_ID || '';
+ if (!projectId) {
+   internal.warn('AGENTUITY_CLOUD_PROJECT_ID is not set; eval runs may not persist.');
+ }

334-335: Use slice instead of deprecated substr in ID generation.

- id: `local-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`,
+ id: `local-${Date.now()}-${Math.random().toString(36).slice(2, 11)}`,

410-417: Make metadata regex resilient to type annotations.

Handles export const metadata: Type = { ... }.

- const metadataRegex = /export\s+const\s+metadata\s*=\s*\{/;
+ const metadataRegex =
+  /export\s+const\s+metadata(?:\s*:\s*[^=]+)?\s*=\s*\{/;

445-463: String-munging JSON is brittle.

Quotes in values, nested objects, or keys with hyphens will break. Prefer importing the module and reading metadata or use a tiny parser for the export.

Would you like me to replace this with a safe static analysis using a parser (e.g., acorn) or switch to dynamic import for metadata only?


117-152: Sequential dynamic imports may be slow for many evals.

Optional: parallelize with Promise.allSettled on filtered files, then pick the first match.

Also applies to: 170-205

src/router/router.ts (1)

428-438: Re-check pending header semantics.

Since you now await waitUntilAll, hasPending() should usually be false; consider removing the header or gating wait by config.

Would you like me to add a config flag (e.g., awaitWaitUntilAll: boolean) to preserve previous non-blocking behavior?

src/server/bun.ts (2)

65-67: Safer global leakage for internal logger

Avoid a plain string property on globalThis. Prefer a Symbol or a unique namespace object to reduce collision risk.

Apply this diff:

-	(globalThis as any).__agentuityInternalLogger = logger;
+	const kInternalLogger = Symbol.for('@agentuity/internalLogger');
+	(globalThis as any)[kInternalLogger] = logger;

178-191: Internal routes: add CORS, generate runId, and guard body

  • Add '*' CORS header for parity with Node handler.
  • Pass a generated runId (helps downstream logging/tracing).
  • Guard req.body (can be null in Bun).

Apply this diff:

-				// Handle internal routes first
-				if (url.pathname.startsWith('/_agentuity/')) {
-					const internalResponse = await internalRoutes.handleInternalRoute({
-						method,
-						url: url.pathname,
-						headers: req.headers.toJSON(),
-						body: req.body as unknown as ReadableStream<ReadableDataType>,
-						request: getRequestFromHeaders(req.headers.toJSON(), ''),
-						setTimeout: (_val: number) => void 0,
-					});
-					if (internalResponse) {
-						return internalResponse;
-					}
-				}
+				// Handle internal routes first
+				if (url.pathname.startsWith('/_agentuity/')) {
+					const runId =
+						globalThis.crypto?.randomUUID?.() ??
+						Date.now().toString(36);
+					const internalResponse = await internalRoutes.handleInternalRoute({
+						method,
+						url: url.pathname,
+						headers: req.headers.toJSON(),
+						body:
+							(req.body ??
+								undefined) as unknown as ReadableStream<ReadableDataType>,
+						request: getRequestFromHeaders(req.headers.toJSON(), runId),
+						setTimeout: (_val: number) => void 0,
+					});
+					if (internalResponse) {
+						const headers = new Headers(internalResponse.headers);
+						if (!headers.has('Access-Control-Allow-Origin')) {
+							headers.set('Access-Control-Allow-Origin', '*');
+						}
+						return new Response(internalResponse.body, {
+							status: internalResponse.status,
+							headers,
+						});
+					}
+				}
src/index.ts (1)

1-1: ESM path consistency for bundler/NodeNext

New imports/exports should include “.js” to match existing style (e.g., promptMetadata, patchportal) and avoid runtime resolution issues under node16/nodenext.

Apply:

-export * from './apis/eval';
+export * from './apis/eval.js';
@@
-export { hash, hashSync } from './utils/hash';
+export { hash, hashSync } from './utils/hash.js';
@@
-import EvalAPI from './apis/eval';
-import EvalJobScheduler from './apis/evaljobscheduler';
+import EvalAPI from './apis/eval.js';
+import EvalJobScheduler from './apis/evaljobscheduler.js';

As per coding guidelines.

Also applies to: 5-5, 16-17

src/utils/promptMetadata.ts (1)

75-83: Unify hashing and fix typos (comiledPrompts) in concat path

  • Use hashSync instead of direct crypto for consistency.
  • Rename comiledPrompts → compiledPrompts.
  • Drop the unused crypto import afterwards.

Apply:

-export async function processPromptMetadataConcat(
-	comiledPrompts: string[]
-): Promise<string> {
-	const compiledPromptsString = comiledPrompts.join('');
+export async function processPromptMetadataConcat(
+	compiledPrompts: string[]
+): Promise<string> {
+	const compiledPromptsString = compiledPrompts.join('');
@@
-		comiledPrompts.map(async (compiledPrompt) => {
+		compiledPrompts.map(async (compiledPrompt) => {
@@
-	const newCompiledHash = crypto
-		.createHash('sha256')
-		.update(comiledPrompts.join(''))
-		.digest('hex');
+	const newCompiledHash = hashSync(compiledPrompts.join(''));

Additionally remove the top-level crypto import if now unused:

-import crypto from 'crypto';

As per coding guidelines.

Also applies to: 97-101

src/router/context.ts (1)

134-139: Validate eval slugs using loaded metadata

You load evalMetadataMap but never use it. Validate slugs to avoid futile runEval calls and clearer logs.

Apply:

@@
-			const evalMetadataMap = await evalAPI.loadEvalMetadataMap();
+			const evalMetadataMap = await evalAPI.loadEvalMetadataMap();
 			logger.info(`📚 Loaded ${evalMetadataMap.size} eval mappings`);
@@
-			for (const evalSlug of promptMeta.evals) {
+			for (const evalSlug of promptMeta.evals) {
+				if (!evalMetadataMap.has(evalSlug)) {
+					logger.warn(
+						`⏭️  Unknown eval '${evalSlug}' — skipping (not in metadata map)`
+					);
+					continue;
+				}
 				try {

Also applies to: 205-206

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5bd84ff and cba15da.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (15)
  • .gitignore (1 hunks)
  • package.json (2 hunks)
  • src/apis/api.ts (3 hunks)
  • src/apis/eval.ts (1 hunks)
  • src/apis/evaljobscheduler.ts (1 hunks)
  • src/apis/prompt/index.ts (1 hunks)
  • src/index.ts (3 hunks)
  • src/router/context.ts (2 hunks)
  • src/router/router.ts (1 hunks)
  • src/server/bun.ts (3 hunks)
  • src/server/internal-routes.ts (1 hunks)
  • src/server/node.ts (3 hunks)
  • src/utils/hash.ts (1 hunks)
  • src/utils/promptMetadata.ts (3 hunks)
  • test/utils/hash.test.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (10)
{src,test}/**/!(*.d).ts

📄 CodeRabbit inference engine (AGENT.md)

{src,test}/**/!(*.d).ts: Use strict TypeScript and prefer unknown over any
Use ESM import/export syntax; avoid CommonJS require/module.exports
Use relative imports for internal modules
Keep imports organized (sorted, no unused imports)
Use tabs with a visual width of 2 spaces
Limit lines to a maximum of 80 characters
Use single quotes for strings
Use proper Error types; do not throw strings
Prefer template literals over string concatenation

Files:

  • src/utils/hash.ts
  • src/router/router.ts
  • src/apis/prompt/index.ts
  • src/server/internal-routes.ts
  • src/utils/promptMetadata.ts
  • src/apis/eval.ts
  • src/server/bun.ts
  • src/apis/evaljobscheduler.ts
  • src/server/node.ts
  • src/apis/api.ts
  • test/utils/hash.test.ts
  • src/index.ts
  • src/router/context.ts
src/router/**

📄 CodeRabbit inference engine (AGENT.md)

Router code (request/response handling, streaming) resides in src/router/

Files:

  • src/router/router.ts
  • src/router/context.ts
src/apis/**

📄 CodeRabbit inference engine (AGENT.md)

Place core API implementations under src/apis/ (email, discord, keyvalue, vector, objectstore)

Files:

  • src/apis/prompt/index.ts
  • src/apis/eval.ts
  • src/apis/evaljobscheduler.ts
  • src/apis/api.ts
src/apis/**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/code-generation.mdc)

src/apis/**/*.ts: Do not hardcode generated prompt content (e.g., copyWriter) in source files; load it dynamically
Avoid overly complex TypeScript generics for generated content; prefer simple, maintainable types
Maintain type safety for dynamically loaded content by generating and referencing TypeScript definitions, with proper annotations for require() results
Do not use relative imports to generated artifacts; resolve via absolute node_modules paths or the package entry
Generated content is loaded at runtime, not build time; avoid static imports of generated modules
Prefer bracket notation for accessing slug-named properties with hyphens (e.g., prompts['slug-name'])
Avoid relative require('./generated/_index.js'); resolve absolute paths from process.cwd() into node_modules for generated assets

Files:

  • src/apis/prompt/index.ts
  • src/apis/eval.ts
  • src/apis/evaljobscheduler.ts
  • src/apis/api.ts
src/apis/prompt/index.ts

📄 CodeRabbit inference engine (.cursor/rules/code-generation.mdc)

src/apis/prompt/index.ts: Use dynamic loading with absolute paths; check dist first then src, and provide fallbacks when generated content is missing
Use require() (CommonJS) to load generated modules at runtime rather than import(), ensuring bundler compatibility
When loading generated modules, guard with fs.existsSync and wrap require() in try/catch; default to safe content on failure

Files:

  • src/apis/prompt/index.ts
src/apis/prompt/**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/code-generation.mdc)

src/apis/prompt/**/*.ts: In TS interfaces that include hyphenated slugs, quote property names (e.g., 'slug-name') and access via bracket notation
Generated content must import from @agentuity/sdk rather than relative paths

Files:

  • src/apis/prompt/index.ts
src/server/*.ts

📄 CodeRabbit inference engine (.cursor/rules/code-generation.mdc)

src/server/*.ts: Load generated content during server context creation and provide fallbacks when signatures or content are missing
In compile helpers, prefer using generated signature functions when present and provide a manual compilation fallback when absent

Files:

  • src/server/internal-routes.ts
  • src/server/bun.ts
  • src/server/node.ts
src/server/{server,bun,node,agents}.ts

📄 CodeRabbit inference engine (AGENT.md)

Server components live in src/server/ as server.ts, bun.ts, node.ts, and agents.ts

Files:

  • src/server/bun.ts
  • src/server/node.ts
test/**

📄 CodeRabbit inference engine (AGENT.md)

Tests must mirror the source structure under the test/ directory

Files:

  • test/utils/hash.test.ts
src/index.ts

📄 CodeRabbit inference engine (AGENT.md)

src/index.ts must be the entry point and export server, logger, types, and APIs

Always export interpolateTemplate from the main SDK index

Files:

  • src/index.ts
🧠 Learnings (7)
📚 Learning: 2025-09-24T18:33:34.725Z
Learnt from: jhaynie
PR: agentuity/sdk-js#179
File: src/router/context.ts:36-58
Timestamp: 2025-09-24T18:33:34.725Z
Learning: In the AgentContextWaitUntilHandler class in src/router/context.ts, the running counter should only be decremented in waitUntilAll, not when individual promises complete. The design intent is that running represents work that hasn't been fully processed as a batch, maintaining the collective nature of the wait-until pattern where a session isn't done until all deferred work is processed together.

Applied to files:

  • src/router/router.ts
📚 Learning: 2025-10-03T22:20:06.957Z
Learnt from: CR
PR: agentuity/sdk-js#0
File: .cursor/rules/code-generation.mdc:0-0
Timestamp: 2025-10-03T22:20:06.957Z
Learning: Applies to src/apis/prompt/index.ts : When loading generated modules, guard with fs.existsSync and wrap require() in try/catch; default to safe content on failure

Applied to files:

  • src/apis/prompt/index.ts
📚 Learning: 2025-10-03T22:20:06.957Z
Learnt from: CR
PR: agentuity/sdk-js#0
File: .cursor/rules/code-generation.mdc:0-0
Timestamp: 2025-10-03T22:20:06.957Z
Learning: Applies to src/apis/prompt/index.ts : Use dynamic loading with absolute paths; check dist first then src, and provide fallbacks when generated content is missing

Applied to files:

  • src/apis/prompt/index.ts
📚 Learning: 2025-10-03T22:20:06.957Z
Learnt from: CR
PR: agentuity/sdk-js#0
File: .cursor/rules/code-generation.mdc:0-0
Timestamp: 2025-10-03T22:20:06.957Z
Learning: Applies to src/apis/prompt/index.ts : Use require() (CommonJS) to load generated modules at runtime rather than import(), ensuring bundler compatibility

Applied to files:

  • src/apis/prompt/index.ts
📚 Learning: 2025-10-03T22:20:06.957Z
Learnt from: CR
PR: agentuity/sdk-js#0
File: .cursor/rules/code-generation.mdc:0-0
Timestamp: 2025-10-03T22:20:06.957Z
Learning: Applies to src/apis/**/*.ts : Do not hardcode generated prompt content (e.g., copyWriter) in source files; load it dynamically

Applied to files:

  • src/apis/prompt/index.ts
📚 Learning: 2025-10-03T22:20:06.957Z
Learnt from: CR
PR: agentuity/sdk-js#0
File: .cursor/rules/code-generation.mdc:0-0
Timestamp: 2025-10-03T22:20:06.957Z
Learning: Applies to src/apis/prompt/**/*.ts : Generated content must import from agentuity/sdk rather than relative paths

Applied to files:

  • src/apis/prompt/index.ts
📚 Learning: 2025-10-03T22:20:06.957Z
Learnt from: CR
PR: agentuity/sdk-js#0
File: .cursor/rules/code-generation.mdc:0-0
Timestamp: 2025-10-03T22:20:06.957Z
Learning: Applies to src/apis/**/*.ts : Avoid relative require('./generated/_index.js'); resolve absolute paths from process.cwd() into node_modules for generated assets

Applied to files:

  • src/apis/prompt/index.ts
🧬 Code graph analysis (11)
src/utils/hash.ts (2)
src/router/data.ts (1)
  • arrayBuffer (188-191)
src/index.ts (2)
  • hash (5-5)
  • hashSync (5-5)
src/router/router.ts (2)
src/logger/index.ts (1)
  • logger (3-3)
src/logger/user.ts (1)
  • logger (8-8)
src/apis/prompt/index.ts (1)
src/logger/internal.ts (2)
  • internal (178-194)
  • error (128-132)
src/server/internal-routes.ts (1)
src/server/types.ts (1)
  • ServerRequest (23-31)
src/utils/promptMetadata.ts (2)
src/apis/patchportal.ts (1)
  • PatchPortal (11-125)
src/utils/hash.ts (1)
  • hashSync (27-29)
src/apis/eval.ts (1)
src/apis/api.ts (1)
  • POST (254-271)
src/server/bun.ts (4)
src/apis/evaljobscheduler.ts (1)
  • EvalJobScheduler (24-173)
src/server/internal-routes.ts (1)
  • InternalRoutesHandler (8-33)
src/types.ts (1)
  • ReadableDataType (134-139)
src/server/util.ts (1)
  • getRequestFromHeaders (337-360)
src/apis/evaljobscheduler.ts (1)
src/index.ts (2)
  • EvalJobScheduler (25-25)
  • PromptAttributes (8-8)
src/server/node.ts (3)
src/apis/evaljobscheduler.ts (1)
  • EvalJobScheduler (24-173)
src/server/internal-routes.ts (1)
  • InternalRoutesHandler (8-33)
src/server/util.ts (1)
  • getRequestFromHeaders (337-360)
test/utils/hash.test.ts (2)
src/index.ts (2)
  • hash (5-5)
  • hashSync (5-5)
src/utils/hash.ts (2)
  • hash (15-21)
  • hashSync (27-29)
src/router/context.ts (4)
src/apis/session.ts (1)
  • markSessionCompleted (6-18)
src/apis/evaljobscheduler.ts (1)
  • EvalJobScheduler (24-173)
src/apis/eval.ts (1)
  • EvalAPI (89-465)
src/utils/promptMetadata.ts (1)
  • PromptAttributes (14-17)
🔇 Additional comments (11)
.gitignore (1)

6-7: LGTM!

Adding .DS_Store to the ignore list is a sensible addition for macOS development environments, preventing system metadata files from being accidentally committed.

package.json (1)

82-82: @clickhouse/client is not currently imported or used in the codebase.

The verification found no source code imports of @clickhouse/client anywhere in the repository. The package appears only in package.json (line 82) and package-lock.json as a direct dependency, but no actual usage exists.

The original review concerns about server-side guards, browser bundling impact, and dynamic import patterns are not applicable until the package is actually used in code. If this package was added as a placeholder for future use, import it first; if it was added accidentally, remove it.

No security vulnerabilities were found for the package.

Likely an incorrect or invalid review comment.

src/apis/api.ts (2)

32-37: ServiceName union: looks good

The union cleanly adds 'eval' and propagates through call sites.


110-113: Eval base URL case: OK

Resolves AGENTUITY_EVAL_URL with sensible fallback to transport URL.

src/server/bun.ts (1)

68-70: Scheduler init at startup: good

Singleton created early; avoids lazy-init races.

src/server/node.ts (1)

110-114: Scheduler init at startup: good

Keeps singleton alive across requests.

test/utils/hash.test.ts (1)

1-61: Hash tests: solid coverage

Parities and edge cases covered; assertions are precise.

src/utils/promptMetadata.ts (2)

11-11: Evals field propagation looks good

Adding evals?: string[] to PromptAttributesParams is reasonable and non-breaking.


38-42: Good: centralized hashing via hashSync

Using hashSync for template/compiled hashes improves consistency and testability.

src/router/context.ts (1)

211-221: Confirm which hash EvalAPI expects (template vs compiled)

You pass templateHash to runEval(promptHash?). Should this be compiledHash instead? Verify API semantics for downstream correlation.

Would you like me to scan call sites and API docs to confirm and update accordingly?

src/utils/hash.ts (1)

1-1: Review comment is incorrect; no changes needed.

The package description explicitly states this is "The Agentuity SDK for NodeJS and Bun" with Node.js >= 22 as the minimum version. In Node.js 22, import crypto from 'crypto' correctly exposes crypto.subtle, and the current implementation is valid. The review comment incorrectly assumes browser compatibility is required and suggests unnecessary changes (globalThis fallback) for a server-only SDK. The codebase is correct as-is.

Likely an incorrect or invalid review comment.

Comment thread package.json Outdated
Comment thread src/apis/api.ts Outdated
Comment thread src/apis/eval.ts
Comment thread src/apis/eval.ts Outdated
Comment thread src/apis/eval.ts
Comment thread src/apis/evaljobscheduler.ts
Comment thread src/index.ts Outdated
Comment thread src/router/context.ts
Comment thread src/router/router.ts
Comment thread src/server/node.ts Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
src/apis/evaljobscheduler.ts (2)

119-149: Logging uses debug level appropriately.

The method logs job details at internal.debug level throughout, which is appropriate for detailed operational data. This addresses the concern from the previous review about info-level logging.


154-168: printState still dumps full jobs array with potential sensitive data.

The method continues to serialize complete job objects without redaction. Prompt metadata may contain sensitive inputs, outputs, or other PII.

Apply the previously suggested fix:

-	public printState(): void {
-		internal.debug('🔍 EvalJobScheduler.printState() called');
-		internal.debug('🔍 Instance ID:', this.instanceId);
-		internal.debug('🔍 EvalJobScheduler State:');
-		internal.debug('📊 Total jobs:', this.pendingJobs.size);
-		internal.debug('📋 Job IDs:', Array.from(this.pendingJobs.keys()));
-		internal.debug(
-			'📦 All jobs:',
-			JSON.stringify(Array.from(this.pendingJobs.values()), null, 2)
-		);
-		internal.debug(
-			'🔍 Global instance check:',
-			globalThis.__evalJobSchedulerInstance === this
-		);
-	}
+	public printState(): void {
+		internal.debug('🔍 EvalJobScheduler.printState() called');
+		internal.debug('🔍 Instance ID:', this.instanceId);
+		internal.debug('🔍 EvalJobScheduler State:');
+		internal.debug('📊 Total jobs:', this.pendingJobs.size);
+		internal.debug('📋 Job IDs:', Array.from(this.pendingJobs.keys()));
+		if (process.env.AGENTUITY_DEBUG_EVALS === '1') {
+			internal.debug(
+				'📦 All jobs (redacted):',
+				JSON.stringify(
+					Array.from(this.pendingJobs.values()).map(j => ({
+						spanId: j.spanId,
+						sessionId: j.sessionId,
+						promptMetadataCount: j.promptMetadata?.length ?? 0,
+						hasOutput: !!j.output,
+						createdAt: j.createdAt,
+					})),
+					null,
+					2
+				)
+			);
+		}
+		internal.debug(
+			'🔍 Global instance check:',
+			globalThis.__evalJobSchedulerInstance === this
+		);
+	}
src/router/context.ts (1)

82-86: Zero-promises branch still missing markSessionCompleted call.

The early-return path executes evals but never marks the session completed, leaving the session in an incomplete state from a tracking perspective. This inconsistency with the Promise.all branch (line 96) means metrics and downstream consumers won't see the session end.

Apply this fix:

 		if (this.promises.length === 0) {
 			internal.debug('No promises to wait for, executing evals directly');
+			await markSessionCompleted(sessionId, 0);
 			await this.executeEvalsForSession(logger, sessionId);
 			return;
 		}
🧹 Nitpick comments (2)
src/apis/evaljobscheduler.ts (1)

40-60: Consider making getInstance() synchronous.

The method is declared async but performs no asynchronous operations. Unless async initialization is planned for the future, this could be simplified to a synchronous method returning EvalJobScheduler directly.

If desired, apply this diff:

-	public static async getInstance(): Promise<EvalJobScheduler> {
+	public static getInstance(): EvalJobScheduler {
 		internal.debug('🔍 EvalJobScheduler.getInstance() called');
 		internal.debug(
 			'🔍 globalThis.__evalJobSchedulerInstance exists:',
 			!!globalThis.__evalJobSchedulerInstance
 		);

 		if (!globalThis.__evalJobSchedulerInstance) {
 			globalThis.__evalJobSchedulerInstance = new EvalJobScheduler();
 			internal.debug(
 				'🆕 Created new EvalJobScheduler instance, ID:',
 				globalThis.__evalJobSchedulerInstance.instanceId
 			);
 		} else {
 			internal.debug(
 				'♻️ Returning existing EvalJobScheduler instance, ID:',
 				globalThis.__evalJobSchedulerInstance.instanceId
 			);
 		}
 		return globalThis.__evalJobSchedulerInstance;
 	}
src/router/context.ts (1)

112-174: Consider reusing EvalAPI instance or document stateless design.

A new EvalAPI instance is created for each session (line 138). If EvalAPI is stateless and lightweight, this is fine. Otherwise, consider reusing a single instance or caching the evalMetadataMap to avoid repeated file system scans.

Also, ensure cleanup failures are non-critical: the removeJob() loop (lines 167-169) doesn't check return values, which is acceptable if job removal is best-effort.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1fbb376 and 0dd2845.

📒 Files selected for processing (4)
  • src/apis/api.ts (2 hunks)
  • src/apis/eval.ts (1 hunks)
  • src/apis/evaljobscheduler.ts (1 hunks)
  • src/router/context.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/apis/eval.ts
  • src/apis/api.ts
🧰 Additional context used
📓 Path-based instructions (4)
src/router/**

📄 CodeRabbit inference engine (AGENT.md)

Router code (request/response handling, streaming) resides in src/router/

Files:

  • src/router/context.ts
{src,test}/**/!(*.d).ts

📄 CodeRabbit inference engine (AGENT.md)

{src,test}/**/!(*.d).ts: Use strict TypeScript and prefer unknown over any
Use ESM import/export syntax; avoid CommonJS require/module.exports
Use relative imports for internal modules
Keep imports organized (sorted, no unused imports)
Use tabs with a visual width of 2 spaces
Limit lines to a maximum of 80 characters
Use single quotes for strings
Use proper Error types; do not throw strings
Prefer template literals over string concatenation

Files:

  • src/router/context.ts
  • src/apis/evaljobscheduler.ts
src/apis/**

📄 CodeRabbit inference engine (AGENT.md)

Place core API implementations under src/apis/ (email, discord, keyvalue, vector, objectstore)

Files:

  • src/apis/evaljobscheduler.ts
src/apis/**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/code-generation.mdc)

src/apis/**/*.ts: Do not hardcode generated prompt content (e.g., copyWriter) in source files; load it dynamically
Avoid overly complex TypeScript generics for generated content; prefer simple, maintainable types
Maintain type safety for dynamically loaded content by generating and referencing TypeScript definitions, with proper annotations for require() results
Do not use relative imports to generated artifacts; resolve via absolute node_modules paths or the package entry
Generated content is loaded at runtime, not build time; avoid static imports of generated modules
Prefer bracket notation for accessing slug-named properties with hyphens (e.g., prompts['slug-name'])
Avoid relative require('./generated/_index.js'); resolve absolute paths from process.cwd() into node_modules for generated assets

Files:

  • src/apis/evaljobscheduler.ts
🧬 Code graph analysis (1)
src/router/context.ts (4)
src/apis/session.ts (1)
  • markSessionCompleted (6-18)
src/apis/evaljobscheduler.ts (1)
  • EvalJobScheduler (24-176)
src/apis/eval.ts (1)
  • EvalAPI (89-465)
src/utils/promptMetadata.ts (1)
  • PromptAttributes (14-17)
🔇 Additional comments (4)
src/apis/evaljobscheduler.ts (2)

65-104: LGTM: Job creation logic is sound.

The method correctly creates or overwrites jobs, counts evals from metadata, and maintains consistent state. Debug logging provides good observability.


109-114: LGTM: Removal logic is straightforward.

The method correctly removes jobs and returns the deletion status.

src/router/context.ts (2)

1-12: LGTM: Import additions are appropriate.

The new imports (trace, EvalAPI, EvalJobScheduler, internal logger, PromptAttributes) properly support the eval execution functionality added to this file.


88-107: LGTM: Promise.all branch properly sequences completion and eval execution.

The flow correctly waits for all promises, marks the session completed with accurate duration, executes evals post-completion, and handles cleanup.

Comment thread src/apis/evaljobscheduler.ts
Comment thread src/router/context.ts
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
src/apis/eval.ts (2)

29-33: Replace any with unknown in index signature.

This issue was already flagged in a previous review. The index signature should use unknown to align with strict TypeScript guidelines, and the biome-ignore comment should be removed.


317-331: Improve HTTP status validation to check 2xx range.

The current check if (!resp.status) only verifies that a status exists, not whether it indicates success. A proper HTTP status check should validate the 2xx range (200-299).

Apply this diff:

-if (!resp.status) {
-	internal.debug('Failed to update the database with the eval run', {
+if (!resp || typeof resp.status !== 'number' || resp.status < 200 || resp.status >= 300) {
+	internal.debug('Failed to update the database with the eval run', {
 		status: resp.status,
 		evalName,
 		evalId: r.evalId,
 		sessionId: r.sessionId,
 	});
 } else {
 	internal.debug('Eval run updated in the database', {
 		status: resp.status,
 		evalName,
 		evalId: r.evalId,
 		sessionId: r.sessionId,
 	});
 }
🧹 Nitpick comments (1)
src/apis/eval.ts (1)

111-158: Consider extracting shared logic to reduce duplication.

Both loadEvalByName and loadEvalById share nearly identical file-scanning, filtering, and dynamic-import logic. Only the metadata matching criterion differs (slug vs. id). Extracting a private helper method would improve maintainability.

Example refactor:

private async findEvalByPredicate(
	predicate: (metadata: any) => boolean
): Promise<{
	evalFn: EvalFunction;
	metadata: { id: string; slug: string; name: string; description: string };
}> {
	const files = fs.readdirSync(this.evalsDir);
	const expectedExt = this.isBundled ? '.js' : '.ts';

	for (const file of files) {
		if (file === 'index.ts' || file === 'index.js') continue;
		if (!file.endsWith(expectedExt)) continue;

		const filePath = path.join(this.evalsDir, file);
		try {
			const fileUrl = pathToFileURL(filePath).href;
			const module = await import(fileUrl);

			if (module.metadata && predicate(module.metadata)) {
				return { evalFn: module.default, metadata: module.metadata };
			}
		} catch (error) {
			internal.debug(`Skipping file ${file} due to import error: ${error}`);
		}
	}
	throw new Error('No eval found matching predicate');
}

async loadEvalByName(evalName: string) {
	return this.findEvalByPredicate(m => m.slug === evalName);
}

async loadEvalById(evalId: string) {
	return this.findEvalByPredicate(m => m.id === evalId);
}

Also applies to: 164-211

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0dd2845 and 55860f0.

📒 Files selected for processing (2)
  • src/apis/eval.ts (1 hunks)
  • src/apis/patchportal.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
src/apis/**

📄 CodeRabbit inference engine (AGENT.md)

Place core API implementations under src/apis/ (email, discord, keyvalue, vector, objectstore)

Files:

  • src/apis/patchportal.ts
  • src/apis/eval.ts
{src,test}/**/!(*.d).ts

📄 CodeRabbit inference engine (AGENT.md)

{src,test}/**/!(*.d).ts: Use strict TypeScript and prefer unknown over any
Use ESM import/export syntax; avoid CommonJS require/module.exports
Use relative imports for internal modules
Keep imports organized (sorted, no unused imports)
Use tabs with a visual width of 2 spaces
Limit lines to a maximum of 80 characters
Use single quotes for strings
Use proper Error types; do not throw strings
Prefer template literals over string concatenation

Files:

  • src/apis/patchportal.ts
  • src/apis/eval.ts
src/apis/**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/code-generation.mdc)

src/apis/**/*.ts: Do not hardcode generated prompt content (e.g., copyWriter) in source files; load it dynamically
Avoid overly complex TypeScript generics for generated content; prefer simple, maintainable types
Maintain type safety for dynamically loaded content by generating and referencing TypeScript definitions, with proper annotations for require() results
Do not use relative imports to generated artifacts; resolve via absolute node_modules paths or the package entry
Generated content is loaded at runtime, not build time; avoid static imports of generated modules
Prefer bracket notation for accessing slug-named properties with hyphens (e.g., prompts['slug-name'])
Avoid relative require('./generated/_index.js'); resolve absolute paths from process.cwd() into node_modules for generated assets

Files:

  • src/apis/patchportal.ts
  • src/apis/eval.ts
🧬 Code graph analysis (1)
src/apis/eval.ts (1)
src/apis/api.ts (1)
  • POST (253-270)
🔇 Additional comments (1)
src/apis/patchportal.ts (1)

17-17: LGTM! Good migration from deprecated substr to substring.

The behavior is identical—both extract 9 characters starting at position 2—and this change aligns with modern JavaScript standards.

Comment thread src/apis/eval.ts
@potofpie potofpie changed the title Eval handler Add eval runner support Oct 22, 2025
@potofpie potofpie merged commit 95347fe into main Oct 22, 2025
3 checks passed
@potofpie potofpie deleted the eval-handler branch October 22, 2025 18:51
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants